Marks and Channels
Heidi Sellmann
February 8, 2024
Pig | IL1B_Jej_pg_mL | IL1B_Ile_pg_mL | IL1B_Col_pg_mL | TNFA_Jej_pg_mL | TNFA_Ile_pg_mL | TNFA_Col_pg_mL | IL8_Jej_pg_mL | IL8_Ile_pg_mL | IL8_Col_pg_mL |
---|---|---|---|---|---|---|---|---|---|
P1 | 158.107 | 258.529 | 18.902 | 10.084 | 191.685 | 67.410 | 2717.095 | 3244.064 | 48.389 |
C1 | 304.570 | 300.510 | 173.388 | 5.293 | 148.321 | 53.500 | 1807.191 | 2747.520 | 521.019 |
HM1 | 126.749 | 92.684 | 80.020 | 0.000 | 163.895 | 0.000 | 2163.072 | 1842.670 | 93.936 |
HM2 | 90.214 | 72.580 | 75.545 | 0.000 | 130.676 | 0.000 | 3434.170 | 3433.543 | 194.084 |
HM3 | 114.224 | 79.751 | 239.901 | 0.000 | 0.000 | 28.050 | 1649.797 | 1678.069 | 334.678 |
HM4 | 132.318 | 107.964 | 155.655 | 3.612 | 5.265 | 18.105 | 3390.124 | 3088.125 | 373.427 |
HM5 | 68.657 | 49.695 | 12.045 | 0.000 | 112.351 | 0.000 | 3148.633 | 3153.588 | 172.843 |
HM6 | 112.428 | 57.860 | 52.582 | 0.000 | 0.000 | 0.000 | 3088.206 | 4084.593 | 183.157 |
P2 | 194.675 | 145.165 | 63.666 | 12.729 | 18.287 | 89.610 | 2474.794 | 2692.826 | 880.425 |
C2 | 401.575 | 564.465 | 305.068 | 42.442 | 17.261 | 84.509 | 1635.162 | 1030.576 | 255.938 |
IF1 | 174.037 | 130.035 | 38.956 | 0.000 | 105.243 | 3.978 | 2013.397 | 1678.351 | 49.232 |
IF2 | 126.663 | 115.666 | 76.857 | 0.000 | 63.294 | 4.975 | 3399.788 | 1727.786 | 116.573 |
IF3 | 141.120 | 53.064 | 182.891 | 4.574 | 0.000 | 12.948 | 3766.222 | 504.454 | 38.554 |
IF4 | 132.275 | 78.283 | 18.266 | 9.628 | 154.257 | 0.000 | 3705.406 | 2293.403 | 222.193 |
IF5 | 111.800 | 91.316 | 82.741 | 0.000 | 223.890 | 4.293 | 3228.713 | 2499.064 | 103.378 |
IF6 | 159.411 | 120.352 | 31.099 | 0.000 | 5.171 | 0.000 | 2001.489 | 2659.008 | 173.974 |
---
title: "BCB 520 Assignment 4"
subtitle: "Marks and Channels"
author: "Heidi Sellmann"
date: "2024-02-08"
categories: [Assignments, Data Viz]
image: "cytoswine.jpg"
code-fold: true
code-tools: true
description: "My Cytoswine and micro-pig-ome data"
format: html
editor: visual
---
# Recalling Assignment 3: Task Abstraction
```{r}
knitr::opts_chunk$set(echo=FALSE, warning=FALSE, error=FALSE, message=FALSE)
```
## Import Data
```{r Import Data}
library(readxl)
Cytokine_summary <- read_excel("Cytokine_summary.xlsx")
if (interactive()) View(Cytokine_summary) # View() only works in an interactive session, so skip it when rendering
knitr::kable(Cytokine_summary)
```
I am adding in my **DATA DICTIONARY** from Assignment 2:
## Data Dictionary:
### Flat Table
Just one excel sheet with **items** and **attributes**.
### Items (rows) = RStudio calls these observations.
In Cytokine_summary, there are 16 pigs. P1 and P2 were pilot pigs fed piglet milk replacer formula. C1 and C2 were farm control pigs (siblings) raised at a farm (the same one as the other pigs), feeding from their own mom, and then we received them for necropsy on day of life (DOL) 28. HM1-6 were fed human milk for 28 days in our lab. IF1-6 were fed infant formula for 28 days in our lab. Pairs of HM and IF (such as HM1 and IF1) were siblings and both raised at the same time, but in different cages.
### Attributes (columns) = RStudio calls these variables.
In Cytokine_summary, there are 10 variables. One of these columns, "Pig", identifies the observations described above. All other variables are cytokine concentrations from ELISAs on intestinal tissues harvested from the pigs at necropsy on DOL 28 (except HM5 and IF5, which had to stay with us a little longer). The cytokines measured were IL1B, TNFA, and IL8. Each cytokine was measured in the jejunum (Jej), ileum (Ile), and colon (Col) of each pig. All concentrations are in pg/mL.
## Load Libraries
```{r Load libraries, message=FALSE}
library(tidyverse)
library(ggplot2)
library(ggpubr)
library(rstatix)
library(dplyr)
```
Now that I have the necessary data and packages, I want to make a box plot of the distribution of each cytokine per pig feeding group: HM, IF, P, and C. Barrie helped me with this!
## Organize the Data
Here we add a Diet column to Cytokine_summary in R, based on the prefix of each pig's ID.
```{r Add in diet column}
Cytokine_summary <- Cytokine_summary %>%
  mutate(Diet = case_when(
    grepl("^P", Pig) ~ "P",
    grepl("^C", Pig) ~ "C",
    grepl("^HM", Pig) ~ "HM",
    grepl("^IF", Pig) ~ "IF",
    TRUE ~ NA_character_ # Default for any pig ID that matches none of the prefixes
  ))
```
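As a side note, the same grouping can probably be derived without listing each prefix, by stripping the trailing digits from the pig ID. This is only a sketch on a made-up `pig_ids` vector (not the real `Cytokine_summary` object), and the level order in `diet_factor` is my own assumption about a sensible plotting order.

```r
# Hypothetical ID vector that mirrors the format of the Pig column
pig_ids <- c("P1", "C1", "HM1", "HM6", "IF3", "P2", "C2")

# Remove the trailing digits so only the group prefix is left
diet_from_id <- sub("[0-9]+$", "", pig_ids)

# Store as a factor with an explicit (assumed) level order for plotting
diet_factor <- factor(diet_from_id, levels = c("C", "P", "HM", "IF"))

# Count how many pigs fall in each diet group
table(diet_factor)
```

The `case_when()` version above is more explicit about what happens to unexpected IDs, which is a reasonable trade-off for a small data set.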
## What does each diet look like?
```{r Individual distributions}
ggplot(Cytokine_summary, aes(x = Diet, y = IL1B_Jej_pg_mL)) +
  geom_boxplot() +
  geom_jitter()
```
**Figure 1.** Box and whisker plots displaying piglet diet group differences in expression levels (pg/mL) of the pro-inflammatory cytokine IL1B in the jejunum. Jitter overlay is representative of each individual pig.
### Legend:
- C = Farm Control
- HM = Human Milk
- IF = Infant Formula
- P = Lab Control (pilot pigs fed piglet milk replacer)
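One way to carry this legend into the plot itself is to relabel the x axis. The sketch below uses a small toy data frame built from a few IL1B jejunum values in the table; `diet_labels`, `toy`, and `p_labeled` are names I made up for illustration, and the full group names come from the data dictionary.

```r
library(ggplot2)

# Hypothetical lookup from diet code to full group name
diet_labels <- c(C = "Farm Control", HM = "Human Milk",
                 IF = "Infant Formula", P = "Lab Control")

# Toy subset: IL1B jejunum values for P1, C1, HM1, HM2, IF1, IF2
toy <- data.frame(
  Diet = c("P", "C", "HM", "HM", "IF", "IF"),
  IL1B_Jej_pg_mL = c(158.107, 304.570, 126.749, 90.214, 174.037, 126.663)
)

# Same boxplot + jitter idea, with readable axis labels
p_labeled <- ggplot(toy, aes(x = Diet, y = IL1B_Jej_pg_mL)) +
  geom_boxplot() +
  geom_jitter(width = 0.1) +
  scale_x_discrete(labels = diet_labels) +
  labs(x = "Diet group", y = "IL1B in jejunum (pg/mL)")
p_labeled
```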
Now I could create individual ones of these for each column... but we are going to try to work smarter, not harder, and create a new data frame to get all these types of plots into 1 figure.
## Creating 9 plots in 1 figure
First we pivot columns 2-10 into long format and split each column name into its cytokine and its region.
```{r Making new data frame Cytokine_long, include=FALSE}
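Cytokine_long <- Cytokine_summary %>%
  pivot_longer(cols = 2:10, names_to = "Reg_Cyt",
               values_to = "Expression") %>%
  # Names look like IL1B_Jej_pg_mL; extra = "drop" discards the trailing "pg" and "mL" pieces without a warning
  separate(Reg_Cyt, into = c("Cytokine", "Region"), sep = "_",
           remove = FALSE, extra = "drop")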
```
Cytokine_long looks good!
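For comparison, `pivot_longer()` can also split the column names in a single step with `names_pattern`. This is a sketch on a two-pig toy table, not a replacement for the chunk above; `toy_wide` and `toy_long` are made-up names, and unlike the code above it does not keep the combined `Reg_Cyt` column.

```r
library(tidyr)

# Toy data: two pigs and two of the nine cytokine columns
toy_wide <- data.frame(
  Pig = c("P1", "C1"),
  IL1B_Jej_pg_mL = c(158.107, 304.570),
  TNFA_Col_pg_mL = c(67.410, 53.500)
)

# Capture the cytokine and region from names such as IL1B_Jej_pg_mL
toy_long <- pivot_longer(
  toy_wide,
  cols = ends_with("_pg_mL"),
  names_to = c("Cytokine", "Region"),
  names_pattern = "^([^_]+)_([^_]+)_pg_mL$",
  values_to = "Expression"
)
toy_long
```

Selecting with `ends_with("_pg_mL")` instead of `2:10` is a design choice: it should keep working if columns are added or reordered, as long as the naming scheme stays the same.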
Now to put that into a boxplot with jitter overlay. We use `facet_wrap()` to split the one plot into subplots, one per region and cytokine combination (slice it up for the viewers).
```{r Cytokine_long into boxplot and jitter plot}
ggplot(Cytokine_long, aes(x = Diet, y = Expression)) +
  geom_boxplot() +
  geom_jitter() +
  facet_wrap(Region ~ Cytokine, scales = "free_y")
```
```{r Save 9 in 1}
ggsave("CytokineSummaryBoxplotJitter.pdf")
```
**Figure 2.** Box and whisker plots displaying piglet diet group differences in expression levels (pg/mL) of the pro-inflammatory cytokines IL1B, IL8, and TNFA in the jejunum, ileum, and colon. Jitter overlay is representative of each individual pig.
Great!
# On to Assignment 4: Manipulating Marks and Channels
Moving forward from Figure 2, we wanted to see if any individual pigs were driving the differences.
## Plot looking at individual pigs and cytokine z scores
Make a new data frame for the cytokine z-scores:
```{r Data frame for Cytokine z-scores}
# Scale columns 2-10 (each cytokine column becomes a z-score)
Cytokine_zscore <- Cytokine_summary %>%
  mutate(across(.cols = 2:10, .fns = ~ as.vector(scale(.x))))
```
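To make the scaling step concrete, here is a small sketch of what `scale()` does to one column, using the IL1B jejunum values of the first four pigs in the table. The variable names are made up, and a z-score from only four values is just for illustration.

```r
# IL1B jejunum values (pg/mL) for P1, C1, HM1, and HM2
x <- c(158.107, 304.570, 126.749, 90.214)

# z-score by hand: subtract the mean, divide by the sample standard deviation
z_manual <- (x - mean(x)) / sd(x)

# scale() returns a one-column matrix; as.vector() drops the matrix structure
z_scale <- as.vector(scale(x))

# Both approaches should agree
all.equal(z_manual, z_scale)
```

After scaling, each column has mean 0 and standard deviation 1, so cytokines with very different pg/mL ranges (IL8 vs. TNFA, for example) can share one y axis.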
Make the z-score data long, then make a boxplot:
```{r Boxplot with pig by zscore, warning=FALSE}
# Make the data long: split columns 2-10 into their cytokine and their region.
# extra = "drop" discards the trailing "pg" and "mL" pieces, which otherwise
# trigger a warning about the extra underscores.
Cytokine_zscore_long <- Cytokine_zscore %>%
  pivot_longer(cols = 2:10, names_to = "Reg_Cyt",
               values_to = "Expression_Z_Scores") %>%
  separate(Reg_Cyt, into = c("Cytokine", "Region"), sep = "_",
           remove = FALSE, extra = "drop")
```
```{r Boxplot of zscores with long data}
ggplot(Cytokine_zscore_long, aes(x = Pig, y = Expression_Z_Scores)) +
  geom_boxplot() +
  geom_jitter(aes(color = Cytokine))
```
```{r Save Zscore cytokines simple}
ggsave("ZscoreCytokinesSimple.pdf")
```
**Figure 3.** Box and whisker plots of individual pigs and their overall cytokine expression z-scores. Point colors represent cytokines.
Super cool! Each pig has 9 dots = 9 cytokine readings (3 cytokines x 3 regions). There don't appear to be any real outlier pigs as a whole (i.e. none are extremely inflamed or non-inflamed for any measure). This suggests there is no obvious hidden structure in my data.
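The visual search for outlier pigs could be backed up with a quick numeric summary per pig. The sketch below runs on invented z-scores (`toy_z`, with made-up pig IDs), and the cutoff of |z| > 2 is an assumption I picked for illustration, not a threshold from the assignment.

```r
library(dplyr)

# Toy long-format z-scores; pig IDs and values are invented
toy_z <- data.frame(
  Pig = rep(c("A1", "A2", "A3"), each = 3),
  Expression_Z_Scores = c(-0.5, 0.2, 0.1, 2.6, 2.2, 0.4, -0.3, 0.0, -2.1)
)

# Per pig: mean z-score and number of readings beyond the assumed cutoff
pig_summary <- toy_z %>%
  group_by(Pig) %>%
  summarise(
    mean_z = mean(Expression_Z_Scores),
    n_extreme = sum(abs(Expression_Z_Scores) > 2), # assumed cutoff of 2
    .groups = "drop"
  ) %>%
  arrange(desc(n_extreme))
pig_summary
```

Applied to `Cytokine_zscore_long`, a pig with many extreme readings would rise to the top of this table, which is the same pattern the boxplot is meant to reveal.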
**ACTION** = SEARCH
**TARGET** = ALL DATA
## Now for manipulating...
What if I add the **CHANNEL** of another dimension of **FILL/COLOR** to Figure 3?
```{r Boxplot filled}
ggplot(Cytokine_zscore_long, aes(x = Pig, y = Expression_Z_Scores, fill = Region)) +
  geom_boxplot() +
  geom_jitter(aes(color = Cytokine))
```
**Figure 4.** Box and whisker plots of individual pigs and their overall cytokine expression z-scores. Point colors represent cytokines; box fill colors represent regions.
Actually, this might be quite helpful! Hmm... different colors?
```{r Boxplot filled better colors}
region_colors <- c("black", "gray", "white")
ggplot(Cytokine_zscore_long, aes(x = Pig, y = Expression_Z_Scores, fill = Region)) +
  geom_boxplot() +
  geom_jitter(aes(color = Cytokine)) +
  scale_fill_manual(values = region_colors)
```
```{r Save Zscore cytokines complex}
ggsave("ZscoreCytokinesComplex.png")
```
**Figure 5.** Box and whisker plots of individual pigs and their overall cytokine expression z-scores. Point colors represent cytokines; grayscale box fills represent regions.
Helpful or a hindrance, I don't know!
How about changing the **SHAPE** of the point **MARK**?
```{r Boxplot with triangle jitter}
region_colors <- c("orange", "purple", "yellow")
ggplot(Cytokine_zscore_long, aes(x = Pig, y = Expression_Z_Scores, fill = Region)) +
  geom_boxplot() +
  geom_jitter(aes(color = Cytokine), shape = 17) + # shape 17 = filled triangle; change to the desired value
  scale_fill_manual(values = region_colors)
```
**Figure 6.** Box and whisker plots of individual pigs and their overall cytokine expression z-scores. Triangle colors represent cytokines; box fill colors represent regions.
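In Figure 6 the triangle is a fixed setting, so it does not encode any data. A possible next step is to map shape to Region inside `aes()`, so that shape becomes a real channel alongside color. This is a sketch on invented z-scores (`toy_shape` is a made-up data frame), not one of the assignment figures.

```r
library(ggplot2)

# Toy long-format data; all values are invented for illustration
toy_shape <- data.frame(
  Pig = rep(c("HM1", "IF1"), each = 6),
  Cytokine = rep(c("IL1B", "IL8", "TNFA"), times = 4),
  Region = rep(rep(c("Jej", "Ile"), each = 3), times = 2),
  Expression_Z_Scores = c(-0.4, 0.3, -0.7, 0.1, 1.2, -0.2,
                          0.5, -0.9, 0.8, -0.1, 0.6, 0.0)
)

# Color encodes cytokine and shape encodes region, both on the points
p_shape <- ggplot(toy_shape, aes(x = Pig, y = Expression_Z_Scores)) +
  geom_boxplot(outlier.shape = NA) + # hide boxplot outliers so points are not drawn twice
  geom_jitter(aes(color = Cytokine, shape = Region), width = 0.15, size = 2)
p_shape
```

Putting both attributes on the points leaves one box per pig, which might reduce the clutter of the filled boxes in Figures 4 to 6, though whether it reads better is a judgment call.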
Alright... I am having too much fun here. Lastly, for my micro-pig-ome data, Barrie and I worked on relative abundance stacked barcharts. I won't go into it too much, but the data I am importing below are microbiome samples (from piglet fecal samples) representative of various timepoints. I want to manipulate the **CHANNEL** of **COLOR** to be a little more discriminating.
The following was copied from my pre_decontam project:
```{r Playing around with microbiome, message=FALSE}
library(ggplot2)
library(magrittr)
library(dplyr)
# Reading in the csv I sent to Barrie
test <- read.csv("test.csv")

# Bar chart with default colors
ggplot(data = test, aes(x = Sample_ID, y = Abundance, fill = Genus)) +
  geom_col(position = "stack") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 4))

# I would love to have the x axis ordered. Barrie also modified the x axis to be more readable.
test <- test %>%
  mutate(Sample_ID = factor(Sample_ID, levels = unique(Sample_ID[order(typeSample)]))) # order x axis by typeSample

ggplot(data = test, aes(x = Sample_ID, y = Abundance, fill = Genus)) +
  geom_col(position = "stack") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0, size = 4)) +
  # This layer adds the sample type as text on top, with its own color and legend, I think
  geom_text(aes(x = Sample_ID, y = Inf, label = typeSample, color = typeSample),
            vjust = 0, angle = 90, hjust = 1, size = 1)
```
**Figure 7a.** Piglet microbiome relative abundances with the default (poor) colors.
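The x-axis reordering step in the chunk above is compact, so here is a toy walk-through of what it does. The sample IDs and sample types are invented; only the `factor(..., levels = unique(...[order(...)]))` pattern is taken from the code above.

```r
# Toy sample table; IDs and sample types are invented
toy_samples <- data.frame(
  Sample_ID = c("S3", "S1", "S2", "S4"),
  typeSample = c("late", "early", "late", "early")
)

# order() sorts the rows by typeSample (ties keep their original order);
# unique() then keeps each Sample_ID once, in that sorted order
ordered_ids <- unique(toy_samples$Sample_ID[order(toy_samples$typeSample)])

# Factor levels control the order in which ggplot2 draws the x axis
toy_samples$Sample_ID <- factor(toy_samples$Sample_ID, levels = ordered_ids)
levels(toy_samples$Sample_ID)
```

Because ggplot2 draws a discrete axis in factor-level order, samples of the same type end up next to each other on the x axis.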
I am going to manipulate color now!
```{r Barrie Microbiome + Janet Colors}
library(RColorBrewer)
custom_col15 <- c("#FF0000", "#00B0F0", "#FFFF00", "#96D050", "#CC3399",
                  "#375623", "#FFC000", "#0070C0", "#990033", "#00B050",
                  "#FF00FF", "#66FF99", "#F96E05", "#FFFF99", "#000000") #,
                  # "#0000FF", "#FF7C80", "#CC66FF", "#00FF00", "#002060")
ggplot(data = test, aes(x = Sample_ID, y = Abundance, fill = Genus)) +
  geom_col(position = "stack") +
  scale_fill_manual(values = custom_col15) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0, size = 4)) +
  # This layer adds the sample type as text on top, with its own color and legend, I think
  geom_text(aes(x = Sample_ID, y = Inf, label = typeSample, color = typeSample),
            vjust = 0, angle = 90, hjust = 1, size = 1)
```
**Figure 7b.** Piglet microbiome relative abundances with better color!
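One thing to watch with a manual palette is that `scale_fill_manual()` needs at least as many colors as there are levels in the fill variable. The sketch below checks this and names the palette so each genus keeps a fixed color. The genus names are hypothetical placeholders (the real levels come from `test$Genus`), and `palette3` is just the first three colors of the custom palette.

```r
# Hypothetical genus names; the real levels come from test$Genus
genera <- c("Bacteroides", "Escherichia", "Lactobacillus")

# First three colors of the custom palette above
palette3 <- c("#FF0000", "#00B0F0", "#FFFF00")

# Stop early if there are fewer colors than genera
stopifnot(length(palette3) >= length(genera))

# Naming the vector pins each genus to a color regardless of factor order
named_palette <- setNames(palette3, genera)
named_palette
```

A named vector like this can be passed to `scale_fill_manual(values = ...)`, which should keep colors consistent if the same genera appear in several figures.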
# In summary
## Expressiveness and Effectiveness
Shown by Figures 3, 4, 5, and 6.
Figures 3 and 5 = best, Figures 4 and 6 = meh!
## Discriminability
Figure 4 fine-tuned Figure 3 by adding GI region as the box fill.
However, Figure 5 tried to add in more colors, and ultimately, this just added to the cognitive load. Not sure how to use Region helpfully without overstimulating/confusing.
## Separability
Head to Figure 7. Gut microbiome stacked barcharts are a little easier to distinguish with the updated color scheme.
## Popout
We were searching for popout in Figure 3. I do notice C2 is a bit of a bigger box compared to the rest.
Thanks! TTFN!